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ABSTRACT 


PCI Express is a high-speed serial connection that operates more like a network than a bus. PCI Express will serve as a general purpose I/O 
interconnects for a wide variety of future computing and communications platforms. PCI Express (PCle) is implemented with a split-transaction protocol 
that provides more bandwidth and is compatible with existing operating systems. PCI Express has three discrete logical layers: the Transaction Layer, the 
Data Link Layer, and the Physical Layer. This paper analyze and simulates the function of Transaction layer IP core in the System Level with top-down 
design method, wrote the codes to implement Transaction Layer using Very high speed hardware description language (VHDL) and provided the 
simulation results using Active HDL Simulation tool. The simulation result shows that the designed IP core meets the required protocol specifications for 


the proper functioning of PCI Express Transaction layer. 
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1. INTRODUCTION 


Pol Express is a high-speed serial connection that operates more like a network 
than a bus. PCI Express will serve as a general purpose I/O interconnects for a wide 
variety of future computing and communications platforms. Instead of one bus that 
handles data from multiple sources, PCle has a switch that controls several point-to- 
point serial connections. Peripheral Component Interconnect (PCI) slots are such an 
integral part of a computer's architecture that most people take them for granted. For 
years, PCI has been a versatile, functional way to connect sound, video and network 
cards to a motherboard. Key PCI attributes, such as its usage model and software 
interfaces are maintained whereas its bandwidth-limiting, parallel bus implementation 
is replaced by a long-life, fully-serial interface. A  split-transaction protocol is 
implemented with attributed packets that are prioritized and optimally delivered to 
their target. 

The PCI Express Architecture is specified in three logical layers as shown in 
figure 1. The PCI model with load-store architecture with a flat address space is 
maintained to provide compatibility to all existing applications and drivers. Transaction 
layer, Data link layer and physical layer form the basic architecture. Each of these 
layers process the information or data being transmitted and received and are 
subdivided into two sections accordingly.PCl Express components communicates 
with each other through packets which carries the information. These Packets are 
generated in the Transaction and Data Link Layers. The physical layer transports 
packets between the link layers of two PCI Express agents. 
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PCI Express™ Architecture showing logical layers 


The primary role of a link layer is to ensure reliable delivery of the packet across the PCI Express link. The transaction layer receives read and write 


requests from the software layer and creates request packets for 
transmission to the link layer. As the transmitted packets flow through the 
other layers, they are extended with additional information necessary to 
handle packets at those layers. 


2. TRANSACTION LAYER 


The transaction layer is the top Layer of PCI Express architecture. It is 
responsible for the assembly and disassembly of data Packets used for 10 
communication. The transaction layer receives read and write requests 
from the software layer and creates request packets for transmission to 
the link layer. These Data packets are called TLP’s (Transaction layer 
packets). The Transaction Layer is also responsible for managing credit- 
based flow control for TLPs. All requests are implemented as split 
transactions and some of the request packets will need a response 
packet. Each packet has a unique identifier that enables response 
packets to be directed to the correct originator. TLP’s supports either 
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Table.1 PCI Express Address Space and Transaction Types 


Transaction Types 


Purpose 
Read. Write Transfer data to or from a location in the 
system memorymap 

Transfer data to or from a location in the 
system IO map 

Transfer data to or from a location in the 
configuration space of a PCI-compatible 
device. 

(General in-band messaging and event 
reporting (without consuming memory or IO 
address resources) 


Read, Write 


Read, Write 


Baseline. V endor- 
specific 
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32bit memory addressing or extended 64bit memory addressing. The 
transaction layer supports four address spaces: it includes the three 
PCI address spaces memory, I/O, configuration and adds a Message 
Space to support all prior side-band signals, such as _ interrupts, 
power-management requests, resets, and so on, as_ in-band 
Messages. The basic use of each address space is shown in Table.1. 


2.1. Transaction layer packet Format 


control for TLPs. All requests are implemented as split transactions. 
Accesses to the four address spaces in PCI Express are 
accomplished using split-transaction requests and completions. Three 
kinds of packets Posted, Non-Posted and Completion (Cpl) are 
responsible for transactions handled by the transaction layer. Each 
Transaction Layer Packet contains a three or four double word (12 or 
16 byte) header. Figure 2 shows the complete view of a TLP Four 
double word format. Included in the 3DW or 4DW header are two 
fields, Type and Format (Fmt), which define the format of the 
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entire TLP as it moves between devices in the PCI Express topology. Generic 4DW Header TLP Format 


2.2. Overview of TLP Header Information 
As TLPs arrive at an ingress port, they are first checked for errors at both the 
physical and data link layers of the receiver. Assuming there are no errors, 
TLP routing is performed; basic steps include: 
e The TLP header Type and Format fields in the first DWord are examined 
to determine the size and format of the remainder of the packet 
e Depending on the routing method associated with the packet, the device 
will determine if it is the intended recipient; if so, it will accept (consume) 
the TLP. If it is not the recipient, and it is a multi-port device, it will forward 
the TLP to the appropriate egress port--subject to the rules for ordering 
and flow control for that egress port 
e If it is neither the intended recipient nor a device in the path to it, it will 
generally reject the packet as an Unsupported Request (UR). 
Table.2 below summarizes the encodings used in TLP header Type and 
Format fields. These two fields, used together, indicate TLP format and 
routing to the receiver. 


2.3. Split Transaction Protocol 

Accesses to the four address spaces in PCI Express are accomplished using 
split-transaction requests and completions. The split transaction protocol is an 
improvement over earlier bus protocols (e.g. PCI) which made extensive use 
of bus wait-states or delayed transactions (retries) to deal with latencies in 
accessing targets. In PCI Express, the completion following a request is 
initiated by the completer only when it has data and/or status ready for 
delivery. The fact that the completion is separated in time from the request 
which caused it also means that two separate TLPs are generated, with 
independent routing for the request TLP and the Completion TLP. Note that 
while a link is free for other activity in the time between a request and its 
subsequent completion, a split-transaction protocol involves some additional 
overhead as two complete TLPs must be generated to carry out a single 
transaction. 


2.4. Virtual Channel (VC) and Transaction ordering 
The VC mechanism provides support for carrying, throughout the fabric, traffic 


Table.2 TLP Header Type and Format Field encoding 


TLP 


FMT{1:0] 


TYPE [4:0] 


Memory Read Request (MRd) 00 = 3DW, no data 01 = 4DW, 0 0000 


Memory Read Lock Request 
(MRdLk) 


no data 


00 = 3DW, no data 01=4DW, 0 0001 


no data 


Memory Write Request (Mir) |10 = 3DW, w data 11=4DW, (0 0000 


10 Read Request (IORd) 

10 Write Request (OWr) 
Config Type 0 Read Request 
(CfgRd0) 

Config Type 0 Write Request 
(CigW10) 

Config Type 1 Read Request 
(CfgRdl) 

Config Type 1 Write Request 
(CfgWr1) 

Message Request (sg) 


Message Request W/Data 
(MD) 


Completion (Cpl) 
Completion W Data (CpiD) 
Completion-Locked (CpiLk) 
Completion WData (CpIDLk) 


w dita 
00 = 3DW, no data 
10 =3DW. wi data 
00 = 3DW, no data 


10 =3DW, wi data 
00 = 3DW, no data 
10 = 3DW, wi data 


01 =4DW, no data 


11=4DW, wi data 


00 = 3DW, no data 
10=3DW, wi data 
00 = 3DW, no data 
10 =3DW, wi data 


(00010 
(0 0010 
0.0100 


00100 
00101 
00101 


1 0 RRR* (for 
RRR, see routing 
subfield) 


1 0 RRR* (for 
RRR, see routing 
subfield) 


0 1010 
0 1010 
01011 
01011 


that is differentiated using TC labels. The foundation of VCs is independent fabric resources (queues/buffers and associated control logic). These 
resources are used to move information across Links with fully independent flow-control between different VCs. Traffic is associated with VCs by mapping 


packets with particular TC labels to their corresponding VCs. Table 1 defines the ordering requirements for PCI Express Transacti 


in this table apply uniformly to all types of Transactions on PCI Express 
including Memory, I/O, Configuration, and Messages. The ordering rules 
defined in this table apply within a single Traffic Class (TC). There is no 
ordering requirement among transactions with different TC labels. Note that 
this also implies that there is no ordering required between traffic that flows 
through different Virtual Channels since transactions with the same TC label 
are not allowed to be mapped to multiple VCs on any PCI Express Link. For 
Table 3, the columns represent a first issued transaction and the rows 
represent a subsequently issued transaction. The table entry indicates the 
ordering relationship between the two transactions. In order to obtain higher 
efficiency, Flow Control (FC) is used to prevent overflow of Receiver buffers 
and to enable 

Compliance with the ordering rules, there are six types of information tracked 
by Flow Control for each Virtual Channel: Posted Header (PH), Posted 
Data(PD),Non-Posted Header (NPH) , Non-Posted Data(NPD) ,Completion 
Header(CplH) , Completion Data(CpID),Each Virtual Channel maintains an 
independent Flow Control credit pool. The unit of Flow Control credit is 4 DW 
for data. 


3. SYSTEM ARCHITECTURE AND DESIGN 


Receive layer, Configuration space and Transmit layer are the three modules 
that form the basic architecture of PCI express. Every receive layer consists 
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Table 3 Ordering rules summary 


ons. The rules defined 


Posted Request 


Non-Posted Request 


Memory Write or 
Row Pass Column? | Message Request 
(Col 2) 


Read VO or 
Request Configuration 
(Col 3) Write Request 
(Col 4) 


Memory Write 
or Message 
Request 

(Row A) 


Posted 
Request 


Yes 


Read Request 
(Row B) 


VO or 
Configuration 
Write Request 
(Row C) 


ge 


Read 
Completion 
(Row D) 


VO or 
Configuration 
Write 
Completion 
(Row E) 
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of eight VC’s VCo- VC7 among them VCo is necessary while others are 

optional. The basic architecture of PCI express is shown in figure 3. Each 

VC is sub divided into two modules: Receive buffer and Controller. Each 

receive buffer is subdivided into six parts in order to store PH, PD, NPH, ia 
NPD, CplH, and CpID respectively. Header stores no more than 128 

credits data stored no more than 2047 credits. The size of credits is 
indicated by FC credits. Controller implement the function of FIFO,FC and 
Sequencing (ordering).There must be sequence strictly to the packet of 
Posed and Cpl ,according to the rule of transmit ordering. 

Transmit Layer added VC Arbitration and Digest module to compare with 
Receive layer. Transmit Layer is not only ordering for Posed and Cpl 
packet, but also detect Flow Control. 

e Interface Signal to Software Layer and handshake Sequence rules: 

desc_n[127:0] indicate the VCn transmit header and data; 
data_n[127 : 0] indicate the VCn Receive header and data. Rules of _ patatink layer 
Handshake in Header File transmit and receive: the sender set req_n 
to indicate RTS (Request To Send), the Receiver receives the 
request then set ack_n ; Rules of Handshake in Data file transmit 
and receive : the sender set dfr_n to indicate the beginning of data Se 
transmit, and set ws_n to inform Receiver wait when Receiver is busy 
in the data transmit process, next data will be send until ws_n 
clear, it is time to clear dfr_n which inform receiver data send 
finished when the last 1DW data was sent. 
Interface Signal to Data Link Layer and handshake Sequence rules: desc_data [31 : 0] are TLP signal which come from Data Link Layer, The 
corresponding part of Frame represent frame header. BlockT indicate Replay state of Data Link Layer, and also indicate to pause the transmit TLP. 
FC credits n indicate the FC credits value which includes cred_alloc_p, cred_alloc_np, cred_alloc_cpl, cl_p, cl_np, cl_cpl that needed transmit or 
refresh. vc_reqn is set when VCn needed transmit data to DLL, vc_ack_n is set when Arbitrator permit this request ,;when VC received vc_ack_n 
,then set vc_get_n . When last data was sent vc_get_n will be cleared at the same time, and then inform Arbitrator to end the VC permission. Its 
configure register include conf_addr, conf_data_i, conf_data_o, conf_wr, conf_rd are configure address, data in , data out, read and write signal 
respectively,all of them are used to configure Arbitrator configure register, such as to configure VC arbitrator mode 


4. TRANSMITTER AND RECEIVER FLOW CHART 
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Transaction layer architecture 
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Transmitter Flow chart Receiver flow chart 


The transmitter flowchart is shown in Figure 4.The only different between transmitter flowchart and receiver flowchart is the former needed to add in flow 
control, so pnpc_rd also depends on the flow control. if c_ph[7] is 1shows the other side is already full. cl_p stands for CredAlloc of P ; and cc_p stands 
for Credits Consumed of P. Flowchart (d) is virtual channel flow. In the initial state, if the data link layer is not block and vc_req is valid, now vc value will 
update to next_vc value, and choose next channel. The selection of channel is decision by arbitrate mode. Unnecessary vc may be chosen owing to the 
arbitrator support look-up table, so we must to judge whether the vc_req_n of this channel is 1 or not at this moment. Vc_get_n will clear when the virtual 
channel data transmission the last 1DW, and let vc_arb return to initial state. Before the arbitration we should configure VC registers in order to choose 
the mode of arbitration. 

Figure 5 shows Receiver flowchart, there are four sub-flowcharts: (a) is read flowchart, in this flowchart it is first to judge pnpc_rd is 
read_P, read_NP or read_Cpl TLPs at initial state, if pnpc_rd is 3’b100 then set read_p to start flowchart (b) and (c) which transmit PH and PD.Length 
register will record the remainder data in order to judge whether the data completely transmit. When flowchart (b) and (c) are finished, Rx_rd_st will return 
to initial state. When Data Link Layer transmit packets, Frame Synchronous signals will send, flowchart (d) start to receive data at the same time,St1 will 
judge next state is P. NP or Cpl TLP by judge pnpc_wr . Then the header and data shall be written. Length Register is to judge whether data are 
completely transmission. Write NP, Cpl TLPs and P TLPs roughly as the same. 


5. SIMULATION RESULTS 


Transmitter and Receiver flowcharts are simulated for their functional verification using Active HDL Simulation tool. The figure 6 and figure 7 shows the 
simulation results for receiver and transmitter virtual channels respectively. In figure 6 the different transactions made by the receiver virtual channel is 
divided by a vertical line at every 50ns of time,for example 0 to 50ns line indicates the initialization of various buffers and the line from 350ns to 400ns 
indicates the transaction of receiving data from data link layer. The figure 7 shows various transactions made by the transmitter virtual channel. For 
convenience every transaction is separated with a vertical line at every multiple of 50ns.Transactions include header and data read for posted, Non- 
posted and completion transactions. 
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Simulation diagram of Receiver Virtual channel Simulation diagram of Transmitter Virtual channel 


6. CONCLUSION 


In this paper we presented the method of implementing the widely adopted higher performance PCI Express interconnect transaction layer with a top 
down design method, and wrote the VHDL codes to implement it. The simulation results show that the design can achieve the basic function of PCI 
Express Transaction Layer that meet the protocol of PCI Express™ Base Specification Revision 2.0. The same basic design methods can be used to 
improve the equipment transmission bandwidth by further implementing the PCI Express IP Core. 
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